Computational Molecular Biology, 2026, Vol. 16, No. 2
Received: 02 Feb., 2026 Accepted: 08 Mar., 2026 Published: 21 Mar., 2026
Maize yield prediction plays an essential role in ensuring food security and promoting sustainable agricultural management. This study explores a prediction framework based on soil nutrient characteristics and climate variables to improve the accuracy and reliability of maize yield estimation. Key soil indicators, including nitrogen, phosphorus, potassium, organic matter, and pH value, were combined with climate factors such as temperature, precipitation, and accumulated growing degree days. Multiple prediction models, including traditional statistical approaches, machine learning algorithms, and deep learning methods, were constructed and compared. The study further analyzed the interaction effects between soil and climate variables and evaluated model performance using indicators such as RMSE, MAE, and R². A regional case study was conducted to verify the applicability and robustness of the proposed framework. The results demonstrate that integrating soil nutrient and climate data can significantly enhance maize yield prediction accuracy and provide valuable support for precision agriculture, crop management, and agricultural decision-making.
1 Introduction
Global demand for maize is rising steadily as it underpins food, feed, and industrial supply chains, yet production is increasingly constrained by climate variability and degraded soils. Temperature extremes, altered rainfall, and declining soil fertility jointly threaten yield stability, especially in regions already facing food insecurity. Improving the accuracy of maize yield prediction by explicitly linking soil nutrients with key climate variables is therefore essential for optimizing fertilization, managing risk, and designing climate‑smart production systems. Maize yields respond strongly to interactions between climate conditions and soil nutrient status. Studies in sub‑Saharan Africa and China show that nitrogen (N), phosphorus (P), and potassium (K) inputs can buffer or amplify the impacts of changing CO₂, temperature, and rainfall on yield, and that indigenous soil nutrients strongly modulate yield losses under warming (Falconnier et al., 2020). Long‑term experiments further indicate that soil fertility improvements (e.g., higher total and available N and P) enhance yield stability and sustainability, while climate warming tends to reduce yields where soil fertility is low. At the same time, nutrient management alone is insufficient; integrating soil, climate, and management information is needed to maintain productivity under ongoing climate change (Ocwa et al., 2023). In this context, a predictive framework that couples soil nutrient properties with climate variables can support more precise fertilizer recommendations, reduce environmental risks, and improve resilience of maize‑based systems.
Internationally, two main directions have emerged. First, process‑based crop models are used to simulate maize yield responses to climate scenarios and N management, revealing strong interactions between N inputs, soil N dynamics, and climate drivers in both low‑input and intensive systems (Falconnier et al., 2020). Second, data‑driven approaches, especially machine learning (ML) and deep learning (DL), increasingly predict crop yields from large datasets combining soil, climate, and management information. Systematic reviews show that temperature, rainfall, soil type, soil nutrients, and vegetation indices are among the most frequently used predictors, and that algorithms such as Random Forest (RF), Support Vector Machines, Artificial Neural Networks, CNNs, and LSTMs dominate recent work. For maize specifically, RF models trained on multi‑year field trials in Ghana identified soil properties (e.g., organic carbon, total N, exchangeable bases) and maximum temperature as the most important predictors of yield, surpassing purely climatic models and improving understanding of nutrient-climate interactions (Asamoah et al., 2024). Related studies using RF and other ML algorithms have shown that including both soil and weather variables substantially improves prediction of maize yield under zero N fertilization and in drought‑stressed environments. These advances highlight the potential of combining soil nutrient information with climate variables in robust predictive frameworks, but also reveal gaps: many models rely on limited nutrient descriptors, treat climate and soil separately, or focus on short time periods and narrow environments.
Building on this progress, the present study focuses on prediction of maize yield based explicitly on soil nutrient status and climate variables, aiming to better capture their joint effects. The main research contents are: (1) construction of a comprehensive feature set describing soil nutrients (e.g., N, P, K, organic matter, pH and related properties) and key climate factors (temperature, precipitation, radiation, humidity) relevant to maize growth; (2) development and comparison of data‑driven yield prediction models, with emphasis on ensemble methods such as Random Forest and other ML/DL techniques that have shown strong performance in crop yield prediction; and (3) quantitative analysis of variable importance and interaction patterns between soil nutrients and climate variables, to identify critical drivers of yield variation and potential leverage points for management. The technical route begins with data collection and preprocessing, including quality control and normalization of soil and climate data. Next, the dataset is split into training and testing subsets, and multiple candidate models are trained, tuned, and evaluated using metrics such as coefficient of determination (R²) and root mean square error (RMSE), following best practices from recent ML yield‑prediction studies. Finally, model interpretation techniques (e.g., variable importance analysis and partial response analysis) are applied to quantify how specific combinations of soil nutrients and climate variables influence predicted maize yield, providing both a practical prediction tool and theoretical insight for nutrient management and climate adaptation strategies.
Across diverse environments, maize yield is jointly controlled by soil nutrient status and climate conditions, and their interaction largely determines both productivity and stability. While process‑based models and ML/DL approaches have advanced yield prediction, there remains a need for models that explicitly integrate detailed soil nutrient descriptors with key climate variables and provide interpretable guidance for management. This study addresses that gap by constructing and evaluating data‑driven maize yield prediction models grounded in soil-climate interactions, aiming to support more precise fertilization, risk management, and climate‑smart maize production.
2 Analysis of Factors Influencing Maize Yield
2.1 Mechanism of soil nutrients on maize growth
Maize yield is jointly controlled by soil nutrient supply and climate conditions throughout the growing season. Understanding how these drivers act individually and in combination is essential for reliable yield prediction and targeted management. Adequate N, P, and K fertilization strongly enhances maize growth traits such as plant height, leaf area, cob number, and grain weight, which together raise biomass accumulation and grain yield by large margins compared with unfertilized controls (Kaleri et al., 2026). Long‑term NPK application improves key soil properties, including soil organic carbon and available N, P, and K, which in turn explain a larger share of yield variation than phenological factors in the North China Plain (Wang et al., 2024).
2.2 Effects of climate factors on maize yield
Temperature, precipitation, drought, and vapor pressure deficit (VPD) strongly shape maize yield anomalies at regional to global scales. Temperature‑related extremes generally show stronger associations with yield deviations than precipitation alone, although irrigation can partially buffer high‑temperature damage (Figure 1) (Vogel et al., 2019). In Northeast China, compound drought and heat cause greater yield loss than either stress alone, with warm‑dry years producing the largest reductions and yield loss increasing with temperature and VPD but decreasing with precipitation (Li et al., 2021).
Figure 1 Climate extreme drivers of maize yield anomalies at regional to global scales
2.3 Synergistic mechanism of soil and climate factors
Soil fertility and climate interact to determine both average yield and its stability over time. Long‑term experiments show that balanced NPK fertilization not only raises mean maize yield but also improves the stability of relative yield anomalies, while models that combine climate variables with nutrient status explain far more variation in yield anomalies than climate alone (Zhu et al., 2024). In diverse maize systems, soil moisture and temperature jointly drive yield damage, and predictions that include both components outperform those relying only on temperature and precipitation, underscoring the tight soil-climate coupling.
3 Data Sources and Overview of the Study Area
3.1 Natural and agricultural conditions of the study area
The major maize-producing regions of northern and northeastern China are characterized by temperate monsoon climates with distinct growing seasons, where temperature, precipitation, and sunshine jointly determine maize climate suitability at different phenological stages (Wang et al., 2024). In the Northeast, relatively cooler temperatures and variable rainfall make precipitation a key limiting factor, while temperature plays a stronger role in the suitability index than in more southerly zones. In contrast, the Huang-Huai-Hai (3H) region has warmer average temperatures and generally higher comprehensive climate suitability, although spatial differences in precipitation and sunshine still create heterogeneous yield potentials. Across China’s broader maize belt, temperature variability and climate perturbations can cause substantial yield losses, especially under warming, but these impacts are spatially heterogeneous (Chen et al., 2024).
3.2 Data sources and acquisition methods
Maize yield data and associated environmental variables can be obtained from long-term field trials, experimental stations, and statistical records, often at plot or county scales. Multi-year experiments in Northeast China and the North China Plain provide detailed measurements of yield, phenology, and management, suitable for evaluating soil-climate interactions and model performance. In some studies, plot-scale experiments under different fertilization or tillage systems supply yield and soil measurements across contrasting climate conditions, enabling analysis of management impacts on yield and soil properties (Meng et al., 2021; Qian et al., 2025). For broader regional coverage, station networks combining agronomic records with local weather observations support large-scale assessments of yield responses to climate variability and soil attributes.
3.3 Data preprocessing and quality control
Prior to model construction, environmental and yield data require systematic preprocessing to ensure completeness and consistency. Weather station data are screened for missing values, range violations, and temporal or spatial inconsistencies, often using automated quality-control algorithms tailored to agricultural decision needs. Such systems flag implausible measurements (e.g., unrealistic temperature sequences, relative humidity readings that saturate at implausibly low values, or anomalous rainfall series), enabling early detection and correction or removal of erroneous records. For gridded or satellite-based climate products, temporal aggregation (e.g., daily to monthly) and calculation of growing-season indices are performed to match crop growth stages and modeling time steps. Yield and management records are checked for outliers, coding errors, and inconsistent units across years and locations to avoid bias in training datasets (Archontoulis et al., 2020).
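The screening steps described above can be illustrated with a minimal sketch of a rule-based quality-control pass over daily weather records. The field names (`tmax_c`, `tmin_c`, `rh_pct`, `precip_mm`) and the plausibility limits are hypothetical and chosen only for illustration; an operational system would set limits per station and region and would add temporal and spatial consistency checks.

```python
from typing import Dict, List, Optional

# Hypothetical plausibility limits; real limits depend on the station network and region.
LIMITS = {
    "tmax_c": (-50.0, 60.0),
    "tmin_c": (-60.0, 50.0),
    "rh_pct": (0.0, 100.0),
    "precip_mm": (0.0, 500.0),
}


def flag_record(rec: Dict[str, Optional[float]]) -> List[str]:
    """Return a list of quality-control flags for one daily weather record."""
    flags = []
    for name, (lo, hi) in LIMITS.items():
        value = rec.get(name)
        if value is None:
            flags.append(f"{name}:missing")
        elif not lo <= value <= hi:
            flags.append(f"{name}:out_of_range")
    # Internal consistency: the daily minimum must not exceed the daily maximum.
    tmax, tmin = rec.get("tmax_c"), rec.get("tmin_c")
    if tmax is not None and tmin is not None and tmin > tmax:
        flags.append("tmin_gt_tmax")
    return flags


# Toy records: the first is plausible, the second violates several rules.
records = [
    {"tmax_c": 29.5, "tmin_c": 17.0, "rh_pct": 64.0, "precip_mm": 3.2},
    {"tmax_c": 18.0, "tmin_c": 21.0, "rh_pct": 120.0, "precip_mm": None},
]
flagged = [flag_record(r) for r in records]
clean = [r for r, f in zip(records, flagged) if not f]
```

Returning flags rather than silently dropping rows keeps the decision between correction and removal explicit, which matches the early detection role described in the text.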
4 Construction and Selection of Feature Variables
4.1 Construction of soil nutrient indicator system
A scientific soil nutrient indicator system should reflect both the supply of key macronutrients and the broader edaphic conditions that control maize response. Long‑term omission experiments identify available and total N, P, and K, soil organic carbon, C:N and N:P ratios as primary determinants of yield and nutrient use efficiency, showing that edaphic indicators explain more yield variation than phenological factors in maize systems (Wang et al., 2024). Meta‑analysis in northern China further supports including soil organic matter, total N, and available P and K as core indicators, because these properties consistently increase under rational fertilization and are closely aligned with yield gains and water use efficiency (Jiang et al., 2024).
Figure 2 Spatial heterogeneity of soil nutrient limitations and their effects on maize yield
4.2 Extraction of climate variable features
Climate feature construction should represent both mean conditions and stress events during sensitive growth stages. Studies that assessed the relevance of climatic attributes for corn yield found that solar radiation, precipitation, vapor pressure, and maximum and minimum temperature are among the most influential variables, with radiation slightly exceeding precipitation in importance in Neotropical environments (Sierra-Forero et al., 2024). Regional analyses that combine multiple climate time series with yield records confirm that temperature‑ and water‑related indicators together explain a large share of yield variability, especially when evaluated over the growing season (Luthra et al., 2024).
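As a minimal sketch of how mean conditions and stress events might be turned into growing-season features, the example below accumulates growing degree days (GDD), total precipitation, and a count of hot days from daily records. The base temperature of 10 °C and upper cap of 30 °C follow a convention commonly used for maize, and the 35 °C heat-day threshold is a hypothetical choice; none of these values is taken from the cited studies.

```python
from typing import Dict, Iterable, Tuple


def daily_gdd(tmax: float, tmin: float, base: float = 10.0, cap: float = 30.0) -> float:
    """Growing degree days for one day, with both temperatures clipped to [base, cap]."""
    tmax_clipped = min(max(tmax, base), cap)
    tmin_clipped = min(max(tmin, base), cap)
    return (tmax_clipped + tmin_clipped) / 2.0 - base


def season_features(
    days: Iterable[Tuple[float, float, float]], heat_threshold: float = 35.0
) -> Dict[str, float]:
    """Aggregate daily (tmax, tmin, precipitation) records into season-level features."""
    gdd, precip, heat_days = 0.0, 0.0, 0
    for tmax, tmin, rain in days:
        gdd += daily_gdd(tmax, tmin)
        precip += rain
        if tmax > heat_threshold:  # simple stress-event indicator
            heat_days += 1
    return {"gdd": gdd, "precip_mm": precip, "heat_days": heat_days}


# Toy three-day series: (tmax in degC, tmin in degC, precipitation in mm).
days = [(28.0, 16.0, 0.0), (36.0, 22.0, 5.0), (12.0, 4.0, 10.0)]
features = season_features(days)
```

The same aggregation could be applied per phenological stage rather than per season if stage dates are available.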
4.3 Feature selection and dimensionality reduction methods
High‑dimensional soil-climate datasets require effective feature selection (FS) to avoid overfitting and reduce computational cost. Reviews of machine‑learning yield models emphasize that optimal feature sets, obtained by FS, are essential because only a subset of soil, climate, and management variables truly drive prediction accuracy (Hara et al., 2021). In a dedicated framework for yield prediction, a Relief‑based FS step was combined with linear discriminant analysis feature extraction, before applying machine‑learning classifiers, which markedly improved accuracy over models using all raw variables (Gupta et al., 2022).
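The idea of keeping only an informative, non-redundant subset of predictors can be sketched with a simple correlation filter. This is a deliberately simpler stand-in for the Relief-based and discriminant-analysis steps cited above, not an implementation of them. The feature names, the toy values, and the 0.9 redundancy threshold are all assumptions made for illustration.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def select_features(
    features: Dict[str, List[float]],
    target: List[float],
    k: int = 2,
    max_redundancy: float = 0.9,
) -> List[str]:
    """Rank features by |r| with the target, skipping ones redundant with earlier picks."""
    ranked = sorted(
        features, key=lambda name: abs(pearson(features[name], target)), reverse=True
    )
    selected: List[str] = []
    for name in ranked:
        if all(
            abs(pearson(features[name], features[s])) < max_redundancy for s in selected
        ):
            selected.append(name)
        if len(selected) == k:
            break
    return selected


# Toy data: soil_n and total_n are nearly collinear, so only one should be kept.
features = {
    "soil_n": [40, 55, 60, 75, 90, 110],
    "total_n": [0.8, 1.1, 1.3, 1.5, 1.9, 2.2],
    "rainfall_mm": [420, 300, 510, 380, 560, 450],
    "soil_ph": [6.5, 6.4, 6.6, 6.5, 6.4, 6.6],
}
yield_t_ha = [4.1, 5.6, 5.8, 7.4, 7.9, 9.6]
selected = select_features(features, yield_t_ha, k=2)
```

A linear filter of this kind cannot detect non-linear or interaction effects, which is one reason the cited frameworks use Relief-type or model-based selection instead.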
5 Methods for Prediction Model Construction
5.1 Traditional statistical modeling methods
Traditional statistical methods for yield prediction are mainly based on linear or polynomial relationships between yield and a limited set of explanatory variables, often weather indices. Multiple linear regression and its variants have long been used as benchmarks when comparing newer machine learning approaches for maize and other crops, typically using growing‑season temperature and precipitation plus a time trend to represent technological progress (Leng and Hall, 2020). Extensions such as quadratic, interaction, and polynomial regression have also been applied to maize and other cereals, and can achieve reasonable accuracy when relationships are approximately linear and the number of predictors is small (Shastry et al., 2017).
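A minimal sketch of the benchmark idea, under simplifying assumptions: a linear time trend is fitted first to represent technological progress, and the detrended yield anomalies are then regressed on a single weather index. This two-step procedure is a simplification of joint multiple linear regression and gives the same coefficients only when the weather index is uncorrelated with time, as in the constructed toy data below. All values are synthetic.

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Synthetic series built as 6.0 + 0.1 * (year - 2015) - 0.4 * temp_anomaly.
years = [2015, 2016, 2017, 2018, 2019, 2020]
yields = [5.6, 6.3, 6.4, 6.5, 6.6, 6.1]  # t/ha
temp_anomaly = [1.0, -0.5, -0.5, -0.5, -0.5, 1.0]  # growing-season anomaly, degC

# Step 1: the time trend stands in for technological progress.
trend_slope, trend_intercept = fit_line(years, yields)
anomalies = [y - (trend_intercept + trend_slope * yr) for yr, y in zip(years, yields)]

# Step 2: regress detrended yield anomalies on the weather index.
temp_slope, _ = fit_line(temp_anomaly, anomalies)
```

Quadratic or interaction terms, as mentioned above, would require a multi-predictor least-squares fit rather than this single-predictor helper.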
5.2 Machine learning modeling methods
Machine learning (ML) methods such as Random Forest (RF), Support Vector Regression, and boosted trees have become central to crop yield prediction because they capture non‑linear responses and interactions between soil, climate, and management variables without strict parametric assumptions. For maize, RF has been shown to outperform multiple linear regression at regional and global scales, reducing RMSE from 14-49% of mean yield with linear models to 6-14% with RF, and better reproducing spatial patterns of yield (Jeong et al., 2016). In the U.S. Midwest, a comparative study using Lasso, Support Vector Regressor, RF, and XGBoost with hundreds of environmental features found that XGBoost was the most accurate and stable algorithm for county‑level maize yield prediction (Kang et al., 2020).
5.3 Deep learning and ensemble learning methods
Deep learning (DL) extends ML by learning complex, hierarchical representations from large, high‑dimensional datasets composed of weather, soil, genotype, and remote sensing inputs. A deep neural network trained on thousands of maize hybrid trials across more than 2,000 locations substantially outperformed Lasso, shallow neural networks, and regression trees, reaching an RMSE close to 11-12% of average yield while also supporting feature selection to reduce input dimensionality with minimal accuracy loss (Khaki and Wang, 2019). However, DL does not always dominate: in a U.S. Midwest maize study, LSTM and CNN architectures did not surpass XGBoost, suggesting that tabular environmental datasets may not always benefit from image‑ or sequence‑oriented deep architectures (Kang et al., 2020).
6 Model Training and Evaluation System
6.1 Dataset partitioning and validation strategies
A reasonable partition of the maize yield dataset is the basis for constructing reliable prediction models. In most supervised learning settings, data are divided into training, validation, and test subsets so that model fitting, hyperparameter tuning, and final performance assessment can be clearly separated and avoid information leakage (Bischl et al., 2021). When the number of yearly observations is small, directly reserving an independent test set becomes difficult, and specialized cross‑validation (CV) schemes such as leave‑one‑out (LOO) or nested CV are recommended to obtain unbiased generalization estimates (Dinh and Aires, 2022).
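One way to picture cross-validation when only a few years of data exist is a leave-one-year-out scheme, a grouped variant of leave-one-out in which every observation from one year is held out together. The sketch below only generates index splits; the function name and the toy year labels are hypothetical, and model fitting is left out.

```python
from typing import Iterator, List, Sequence, Tuple


def leave_one_year_out(
    years: Sequence[int],
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        test = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, test


# Toy labels: the harvest year of each observation in the dataset.
years = [2018, 2018, 2019, 2020, 2020, 2020]
folds = list(leave_one_year_out(years))
```

Holding out whole years keeps observations that share the same weather out of both the training and test sets at once, which is one source of the information leakage mentioned above.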
6.2 Model parameter optimization methods
Hyperparameters of machine learning models, such as the number of trees in random forests or learning rates in gradient boosting, strongly influence predictive performance and must be tuned systematically rather than by ad‑hoc trial‑and‑error (Bischl et al., 2021). Classical search strategies include grid search and random search, which evaluate candidate configurations on resampling‑based performance estimates, but they become inefficient as the hyperparameter space grows.
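The difference between the two classical strategies can be sketched as follows. The `validation_error` function is a hypothetical stand-in for a resampling-based performance estimate; in a real study it would train a model and return, for example, a cross-validated RMSE. The hyperparameter names and ranges are illustrative only.

```python
import itertools
import random
from typing import Dict, List, Tuple


def validation_error(n_trees: int, learning_rate: float) -> float:
    """Stand-in for a cross-validated error; a real study would train a model here."""
    return (n_trees - 300) ** 2 / 1e5 + (learning_rate - 0.1) ** 2 * 10


def grid_search(grid: Dict[str, List]) -> Dict:
    """Evaluate every combination in the grid and return the best configuration."""
    names = list(grid)
    candidates = [
        dict(zip(names, combo)) for combo in itertools.product(*grid.values())
    ]
    return min(candidates, key=lambda c: validation_error(**c))


def random_search(bounds: Dict[str, Tuple], n_iter: int, seed: int = 0) -> Dict:
    """Sample n_iter configurations uniformly within the bounds and return the best."""
    rng = random.Random(seed)
    candidates = [
        {
            "n_trees": rng.randint(*bounds["n_trees"]),
            "learning_rate": rng.uniform(*bounds["learning_rate"]),
        }
        for _ in range(n_iter)
    ]
    return min(candidates, key=lambda c: validation_error(**c))


grid = {"n_trees": [100, 300, 500], "learning_rate": [0.01, 0.1, 0.3]}
best_grid = grid_search(grid)  # 3 x 3 = 9 evaluations
best_random = random_search(
    {"n_trees": (50, 600), "learning_rate": (0.005, 0.5)}, n_iter=20
)
```

The grid cost is the product of the number of values per hyperparameter, so each added hyperparameter multiplies the number of model fits, whereas the random-search budget is fixed by `n_iter`.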
6.3 Model evaluation indicator system
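The indicators named in this study (RMSE, MAE, and R²) have standard definitions, sketched below together with RMSE expressed as a percentage of mean observed yield, the form in which several of the cited comparisons are reported. The function names and toy values are illustrative assumptions, not the authors' evaluation code.

```python
from math import sqrt
from typing import Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def relative_rmse_pct(obs: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE as a percentage of the mean observed value."""
    return 100.0 * rmse(obs, pred) / (sum(obs) / len(obs))


# Toy observed and predicted yields (t/ha).
observed = [6.0, 8.0, 10.0]
predicted = [7.0, 8.0, 9.0]
scores = {
    "rmse": rmse(observed, predicted),
    "mae": mae(observed, predicted),
    "r2": r2(observed, predicted),
    "rrmse_pct": relative_rmse_pct(observed, predicted),
}
```

RMSE penalizes large errors more heavily than MAE, so reporting both gives a rough indication of whether errors are concentrated in a few observations.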
7 Case Study: Empirical Analysis of Regional Maize Yield Prediction
7.1 Study area and sample construction
In many recent maize yield prediction studies, the study area is defined to capture both environmental gradients and management diversity so that models generalize beyond a single field or season. For example, plot‑scale work integrates multi‑year trials under contrasting fertilizer systems, combining climate, soil, and satellite data to represent heterogeneous growing conditions across years and treatments (Meng et al., 2021). Similar multi‑farm designs in Western Australia aggregate yield monitor data from thousands of hectares over several seasons, then collocate each observation with soil, terrain, and weather variables to form a dense spatio‑temporal sample set (Filippi et al., 2019).
Figure 3 Workflow for integrating multi-source environmental and agricultural datasets into maize yield prediction samples
7.2 Comparative analysis of multi-model prediction results
Comparative studies consistently show that model performance depends strongly on algorithm choice and input richness. At the plot scale, combining vegetation indices, climate, soil, and fertilizer data, Random Forest and Adaptive Boosting clearly outperform linear regression, SVM, GPR, and KNN, with R² often above 0.85 and the lowest RMSE values (Meng et al., 2021). In a Hungarian field using detailed spatio‑temporal soil and micro‑relief measurements, XGBoost surpassed neural and kernel methods, reaching test accuracies above 95%, while lattice‑based smoothing further improved predictive AUC (Nyéki et al., 2021).
7.3 Result validation and agricultural application analysis
Robust validation is essential to ensure that multi‑model predictions have practical value. Studies highlight that naïve random data splits can substantially overestimate predictive skill, especially when the goal is true forecasting rather than interpolation within a season (Morales and Villalobos, 2023). More rigorous schemes, such as nested k‑fold cross‑validation across years and fields, or leave‑one‑field/leave‑one‑year‑out designs, better reflect operational performance and were used, for instance, in multi‑farm machine‑learning models and in Ghanaian RF models for maize yield and agronomic efficiency (Filippi et al., 2019; Asamoah et al., 2024).
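To contrast true forecasting with within-season interpolation, the sketch below builds ordered (forward-chaining) splits in which a model would be trained only on years before the one being predicted. It is a generic illustration, not the scheme of any cited study; the function name, the `min_train_years` parameter, and the toy labels are assumptions.

```python
from typing import Iterator, List, Sequence, Tuple


def forward_chaining_splits(
    years: Sequence[int], min_train_years: int = 2
) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (test_year, train_indices, test_indices), training only on earlier years."""
    unique = sorted(set(years))
    for k in range(min_train_years, len(unique)):
        train_years = set(unique[:k])
        test_year = unique[k]
        train = [i for i, y in enumerate(years) if y in train_years]
        test = [i for i, y in enumerate(years) if y == test_year]
        yield test_year, train, test


# Toy labels: two fields observed in each of four seasons.
years = [2019, 2020, 2021, 2022, 2019, 2020, 2021, 2022]
splits = list(forward_chaining_splits(years, min_train_years=2))
```

Unlike a random split, no observation from the test year or from later years can reach the training set, so the resulting scores reflect the situation at forecast time.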
8 Results Analysis and Discussion
8.1 Contribution analysis of soil nutrient variables
Feature-importance and interpretable ML studies highlight that specific soil nutrients can dominate maize yield responses, even in data‑rich settings. In a data‑intensive farm management trial, Random Forest analysis showed that urea application was consistently the most critical variable for explaining spatial yield variation, with soil phosphorus, pH, clay content, sodium and plant population also among the leading contributors in different seasons (Maseko et al., 2024). This indicates that both applied N and inherent soil fertility properties jointly control yield in high‑resolution, within‑field prediction. Similar work in precision agriculture, using RF and other models on over 145,000 corn and soybean yield observations, found that soil test P, K, Zn, soil organic matter and cation exchange capacity were key predictors, underscoring the strong explanatory power of nutrient and related soil indicators for yield variation at sub‑field scales (Burdett and Wellen, 2022).
Under nutrient‑limited conditions, omission trials combined with AutoML provide a more explicit decomposition of nutrient contributions. In 324 nutrient omission plot trials across ten agroecological zones in the Eastern Indo‑Gangetic Plains, stack‑ensemble and deep learning models predicted relative nutrient‑limited yields with low RMSE, and permutation importance identified soil pH as the dominant variable controlling N‑ and P‑limited yields (Ahmed et al., 2024). The same analysis showed that soil N and Zn strongly influenced Zn‑limited yield, while spatial trends in K‑limited yield emerged along an east-west gradient, revealing distinct fertility constraints for different nutrients. These findings suggest that soil nutrient variables, especially applied N, soil P, Zn, pH and texture‑related properties, provide high marginal gains in predictive power and are indispensable components of maize yield models based on soil-climate interactions.
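Permutation importance, the attribution technique mentioned above, can be sketched in a model-agnostic way: one predictor column is shuffled at a time and the resulting increase in RMSE is recorded. The `predict` function below is a hypothetical fixed model in which soil pH dominates by construction, and the data are synthetic; the example shows the mechanics only and does not reproduce any cited result.

```python
import random
from math import sqrt
from typing import Callable, Dict, List, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def permutation_importance(
    predict: Callable[[Dict[str, float]], float],
    X: List[Dict[str, float]],
    y: List[float],
    n_repeats: int = 20,
    seed: int = 0,
) -> Dict[str, float]:
    """Mean increase in RMSE when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = {}
    for name in X[0]:
        increases = []
        for _ in range(n_repeats):
            column = [row[name] for row in X]
            rng.shuffle(column)
            permuted = [dict(row, **{name: v}) for row, v in zip(X, column)]
            increases.append(rmse(y, [predict(r) for r in permuted]) - baseline)
        importances[name] = sum(increases) / n_repeats
    return importances


def predict(row: Dict[str, float]) -> float:
    """Hypothetical fitted model: strong pH effect, weak rainfall effect, ignores noise."""
    return 1.5 * row["soil_ph"] + 0.002 * row["rainfall_mm"]


ph_values = [5.0, 5.4, 5.8, 6.2, 6.6, 7.0, 7.2, 7.5]
rain_values = [300, 650, 420, 510, 380, 600, 340, 560]
noise_values = [3, 1, 4, 1, 5, 9, 2, 6]
X = [
    {"soil_ph": p, "rainfall_mm": r, "noise": z}
    for p, r, z in zip(ph_values, rain_values, noise_values)
]
y = [predict(row) for row in X]  # synthetic target, so the baseline error is zero
importance = permutation_importance(predict, X, y)
```

Because the feature that the model never uses leaves the predictions unchanged when shuffled, its importance is zero, while features the model relies on more heavily show larger error increases.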
8.2 Influence weight analysis of climate variables
Across diverse modeling frameworks, climate variables frequently emerge as the largest single contributors to interannual maize yield variability. A global meta‑analysis using 68 simulation studies for wheat, maize and rice showed that maximum temperature and precipitation significantly affected yield responses, with yields declining by 4.21% per 1 °C increase in maximum temperature but increasing by 0.43% per 1% rise in precipitation (Qin et al., 2023). This quantitative gradient highlights the high negative weight of heat stress and the compensating effect of adequate rainfall in crop‑climate response functions. At the global scale, mixed‑effects models updating projected yield responses under CMIP6 scenarios indicate that temperature‑related stress is a dominant driver of future maize yield losses, with projected global maize declines around 22% by late century under high emissions if adaptation is limited (Li et al., 2025).
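The reported sensitivities can be turned into a short worked calculation. The sketch assumes that the two effects are linear and additive and that the pooled coefficients (which cover wheat, maize and rice together) can be applied to a single scenario; both are simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
def yield_change_pct(
    delta_tmax_c: float,
    delta_precip_pct: float,
    temp_coef: float = -4.21,  # % yield change per +1 degC in maximum temperature
    precip_coef: float = 0.43,  # % yield change per +1% precipitation
) -> float:
    """First-order yield change (%) assuming linear, additive responses."""
    return temp_coef * delta_tmax_c + precip_coef * delta_precip_pct


# Scenario: +2 degC maximum temperature with 5% more precipitation.
change = yield_change_pct(2.0, 5.0)  # -8.42 + 2.15, about -6.27 %

# Precipitation increase needed to offset +1 degC under the same assumptions.
offset_precip_pct = 4.21 / 0.43  # about 9.8 %
```

Under these assumptions a rise of nearly 10% in precipitation would be needed to offset each degree of warming, which illustrates why heat stress carries the larger weight.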
Machine‑learning-based attribution provides more detailed rankings of individual climate indicators. A hybrid GGCM-Random Forest framework for China’s maize belt found that chilling days, drought indicators and crop pests/diseases were the main factors influencing projected maize yield changes, with relative importance quantified via RF partial‑dependence analysis (Li et al., 2023). In a separate process‑based and ML study on wheat under future climate scenarios, precipitation explained most yield variability in mid‑century high‑emission conditions, whereas maximum temperature became the dominant limiting factor under later, more strongly warmed scenarios (El-Mahroug et al., 2025). For site‑specific maize prediction with spatio‑temporal XGBoost models, precipitation during the juvenile growth phase (May) was identified as the single most important factor over five years, followed by soil pH, clay content, electrical conductivity and NDVI, again emphasizing the high influence weight of water‑related variables alongside key soil properties.
8.3 Discussion on model applicability and uncertainty
The applicability of soil‑nutrient- and climate‑based yield models depends critically on how uncertainty is handled across space, time and scenario conditions. A recent meta‑analysis of crop yield responses to projected climate change combined mixed‑effects modeling with block bootstrapping to partition uncertainty arising from model structure, climate projections (CMIP6) and emissions pathways, showing that simple pooled OLS tends to underestimate yield losses and under‑represent uncertainty ranges (Li et al., 2025). Similarly, a crop‑model and ML ensemble for maize and soybean across China demonstrated that coupling GGCMs with Random Forest greatly improved correlation (r up to 0.77 for maize) and reduced normalized RMSE, while variance decomposition revealed that the dominant uncertainty source shifted from crop models in the baseline GGCM runs to global climate models and then scenarios as projections extended further into the century (Li et al., 2023). These results imply that model applicability under future climates requires explicit accounting for structural, climate and scenario uncertainties rather than relying on single‑model projections.
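Block bootstrapping resamples contiguous blocks rather than individual observations, so that dependence within a block is preserved when uncertainty ranges are estimated. The sketch below is a generic moving-block bootstrap for the mean of a short series of yearly yield anomalies. It is not the procedure of the cited meta-analysis, and the block length, number of replicates, and data are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def block_bootstrap_means(
    series: Sequence[float], block_len: int = 3, n_boot: int = 1000, seed: int = 0
) -> List[float]:
    """Sorted bootstrap means from resampling overlapping blocks with replacement."""
    rng = random.Random(seed)
    n = len(series)
    starts = range(n - block_len + 1)
    means = []
    for _ in range(n_boot):
        sample: List[float] = []
        while len(sample) < n:
            s = rng.choice(starts)
            sample.extend(series[s : s + block_len])
        sample = sample[:n]  # trim the last block to the original length
        means.append(sum(sample) / n)
    return sorted(means)


def percentile_interval(
    sorted_vals: Sequence[float], level: float = 0.95
) -> Tuple[float, float]:
    """Percentile interval taken from sorted bootstrap replicates."""
    lo_idx = int((1 - level) / 2 * len(sorted_vals))
    hi_idx = int((1 + level) / 2 * len(sorted_vals)) - 1
    return sorted_vals[lo_idx], sorted_vals[hi_idx]


# Toy yearly yield anomalies (t/ha).
anomalies = [-0.4, 0.2, 0.5, -0.1, -0.6, 0.3, 0.1, -0.2, 0.4, -0.3]
boot_means = block_bootstrap_means(anomalies)
lower, upper = percentile_interval(boot_means)
```

Resampling single years instead of blocks would treat consecutive years as independent and would tend to give intervals that are too narrow when anomalies are serially correlated.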
Transferability across domains and scales introduces additional uncertainty dimensions for data‑driven yield models. Domain‑adaptation work on maize in the US Corn Belt, using DANN, KLIEP and RTNN, found that models trained in temperate regions with medium-high growing degree days and moderate vapor pressure deficit generalized well, whereas strong dependence on vegetation indices (GCI) reduced transferability when source and target domains had limited overlap (Priyatikanto et al., 2023). Independent evaluations of cross‑validation strategies in UAV‑based yield prediction further showed that random CV can substantially overestimate performance when models are applied outside their training spatial domain, whereas spatial or leave‑one‑field‑out CV and simpler, regularized models gave more realistic extrapolation accuracy (Habibi et al., 2024). Together with county‑scale ensemble studies that link large prediction errors to low cropland ratios and extreme weather events (Sajid et al., 2022), these findings stress that robust maize yield prediction demands careful validation design, domain‑aware training, and transparent uncertainty quantification before models are applied for management or policy decisions in new regions or under novel climate conditions.
9 Conclusions and Future Research Directions
Existing studies confirm that integrating soil nutrients, soil physical properties, and climate variables can explain a substantial share of maize yield variability across diverse agroecological zones. Soil indicators such as nitrogen fertilizer rate, soil organic carbon, pH, bulk density, and exchangeable bases consistently emerge among the most influential predictors, often exceeding the importance of individual climate variables for yield prediction in tropical and semi-arid environments. At the same time, temperature, rainfall, and related weather indices remain key drivers of interannual variation, especially when combined with management and genotype information in large datasets. From a modeling perspective, tree-based and boosting algorithms (Random Forest, XGBoost, Gradient Boosting) generally outperform linear methods and many deep architectures for maize yield prediction using soil-climate feature sets. Meta‑modeling of process-based simulations and large empirical trial datasets shows that these methods can achieve relative errors around 10-15% when sufficient training samples and well-designed features are available. Systematic reviews across maize and other crops further indicate that these algorithms are among the most frequently adopted and robust options, particularly when coupled with feature engineering and multimodal data integration.
High‑accuracy soil-climate yield models provide actionable information for fertilizer management and nutrient efficiency. In Ghana, Random Forest and XGBoost models trained on long‑term maize trials successfully predicted both yield and agronomic efficiency, highlighting nitrogen rate, rainfall, and key soil properties as dominant management levers. Such models support the design of site‑specific recommendations that can raise productivity while reducing the environmental costs of blanket fertilizer application. Similar ML-process‑model hybrids using APSIM outputs demonstrate that meta‑models can rapidly explore genotype-environment-management scenarios for preseason planning. At larger scales, integrating soil maps, meteorological series, and satellite indicators enables early‑season forecasts that outperform conventional statistical baselines and even some official forecasts. County‑level yield prediction in the U.S. Midwest has shown that XGBoost models using hundreds of environmental features can provide reliable maize forecasts several months before harvest, improving on models based only on basic weather or historical yields. Reviews of precision agriculture emphasize that such predictive systems contribute to resource optimization, risk management, and food‑security planning by linking sensing technologies, big data platforms, and advanced analytics into operational decision support tools.
Despite these advances, several limitations constrain the reliability and transferability of current soil-climate yield models. Studies comparing algorithms against simple baselines show that, under realistic forecasting setups using ordered train-test splits, ML models sometimes offer only modest gains over farm‑level average yields, especially when weather forecast errors are ignored. Systematic reviews also highlight persistent challenges with obtaining high‑quality, harmonized datasets on soil nutrients, management, and high‑resolution yields, which can limit model generalization across regions and seasons. In addition, many models are trained and validated under random data partitioning, leading to over‑optimistic performance estimates for true out‑of‑sample prediction. Future research directions point toward hybrid, transferable, and explainable frameworks. Hybrid models that couple process‑based crop simulators with ML or deep learning have improved accuracy and reduced uncertainty in semi‑arid maize systems, particularly when fusing remote sensing, climate, and soil information. Domain adaptation and transfer‑learning approaches, including partial adversarial networks, are beginning to address domain shifts between ecological zones and could substantially improve cross‑regional maize yield prediction. Reviews stress the need for standardized data protocols, interpretable architectures (e.g., SHAP‑ or XAI‑enhanced models), and scalable, crop‑agnostic pipelines so that soil nutrient and climate‑based yield prediction can be robustly embedded in precision agriculture and sustainability strategies.
Acknowledgments
We would like to thank the anonymous reviewers for their detailed review of the draft. Their specific feedback helped us correct the logical loopholes in our arguments.
Conflict of Interest Disclosure
The authors affirm that this research was conducted without any commercial or financial relationships that could be construed as a potential conflict of interest.
References
Abdel-Salam M., Kumar N., and Mahajan S., 2024, A proposed framework for crop yield prediction using hybrid feature selection approach and optimized machine learning, Neural Computing and Applications, 36: 20723-20750.
https://doi.org/10.1007/s00521-024-10226-x
Aghighi H., Azadbakht M., Ashourloo D., Shahrabi H., and Radiom S., 2018, Machine learning regression techniques for the silage maize yield prediction using time-series images of Landsat 8 OLI, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 11: 4563-4577.
https://doi.org/10.1109/jstars.2018.2823361
Ahmed Z., Krupnik T., Timsina J., Islam S., Hossain K., Kurishi A., Emran S., Harun-Ar-Rashid M., McDonald A., and Gathala M., 2024, Prediction of spatial heterogeneity in nutrient-limited sub-tropical maize yield: implications for precision management in the eastern indo-gangetic plains, Artificial Intelligence in Agriculture, 12: 1-15.
https://doi.org/10.1016/j.aiia.2024.08.001
Archontoulis S., Castellano M., Licht M., Nichols V., Baum M., Huber I., Martinez-Feria R., Puntel L., Ordóñez R., Iqbal J., Wright E., Dietzel R., Helmers M., Vanloocke A., Liebman M., Hatfield J., Herzmann D., Córdova S., Edmonds P., Togliatti K., Kessler A., Danalatos G., Pasley H., Pederson C., and Lamkey K., 2020, Predicting crop yields and soil‐plant nitrogen dynamics in the US Corn Belt, Crop Science, 60: 721-738.
https://doi.org/10.1002/csc2.20039
Asamoah E., Heuvelink G., Chairi I., Bindraban P., and Logah V., 2024, Random forest machine learning for maize yield and agronomic efficiency prediction in Ghana, Heliyon, 10: e37065.
https://doi.org/10.1016/j.heliyon.2024.e37065
Bischl B., Binder M., Lang M., Pielok T., Richter J., Coors S., Thomas J., Ullmann T., Becker M., Boulesteix A., Deng D., and Lindauer M., 2021, Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 13(2): e1484.
https://doi.org/10.1002/widm.1484
Burdett H., and Wellen C., 2022, Statistical and machine learning methods for crop yield prediction in the context of precision agriculture, Precision Agriculture, 23: 1553-1574.
https://doi.org/10.1007/s11119-022-09897-0
Chai T., and Draxler R., 2014, Root mean square error (RMSE) or mean absolute error (MAE)? - Arguments against avoiding RMSE in the literature, Geoscientific Model Development, 7: 1247-1250.
https://doi.org/10.5194/gmd-7-1247-2014
Chen F., Xu X., Chen S., Wang Z., Wang B., Zhang Y., Zhang C., Feng P., and Hu K., 2024, Soil buffering capacity enhances maize yield resilience amidst climate perturbations, Agricultural Systems, 222: 103870.
https://doi.org/10.1016/j.agsy.2024.103870
Chen X., Wang L., Niu Z., Zhang M., Li C., and Li J., 2020, The effects of projected climate change and extreme climate on maize and rice in the Yangtze River Basin, China, Agricultural and Forest Meteorology, 282-283: 107867.
https://doi.org/10.1016/j.agrformet.2019.107867
Chicco D., Warrens M., and Jurman G., 2021, The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation, PeerJ Computer Science, 7: e623.
https://doi.org/10.7717/peerj-cs.623
Dandrifosse S., Jago A., Huart J., Michaud V., Planchon V., and Rosillon D., 2024, Automatic quality control of weather data for timely decisions in agriculture, Smart Agricultural Technology, 8: 100445.
https://doi.org/10.1016/j.atech.2024.100445
Diaz-Gonzalez F., Vuelvas J., Correa C., Vallejo V., and Patiño D., 2022, Machine learning and remote sensing techniques applied to estimate soil indicators - Review, Ecological Indicators, 135: 108517.
https://doi.org/10.1016/j.ecolind.2021.108517
Dinh T., and Aires F., 2022, Nested leave-two-out cross-validation for the optimal crop yield model selection, Geoscientific Model Development, 15: 3519-3536.
https://doi.org/10.5194/gmd-15-3519-2022
El-Mahroug S., Suleiman A., Zoubi M., Al-Omari S., Abu-Afifeh Q., Al-Jawaldeh H., Alta’any Y., Al-Nawaiseh T., Obeidat N., Alsoud S., Alshoshan A., Al-Shibli F., and Ta’any R., 2025, Predictive modeling of climate-driven crop yield variability using DSSAT towards sustainable agriculture, AgriEngineering, 7(5): 156.
https://doi.org/10.3390/agriengineering7050156
Falconnier G., Corbeels M., Boote K., Affholder F., Adam M., MacCarthy D., Ruane A., Nendel C., Whitbread A., Justes É., Ahuja L., Akinseye F., Alou I., Amouzou K., Anapalli S., Baron C., Basso B., Baudron F., Bertuzzi P., Challinor A., Chen Y., Deryng D., Elsayed M., Faye B., Gaiser T., Galdos M., Gayler S., Gérardeaux E., Giner M., Grant B., Hoogenboom G., Ibrahim E., Kamali B., Kersebaum K., Kim S., Laan M., Leroux L., Lizaso J., Maestrini B., Meier E., Mequanint F., Ndoli A., Porter C., Priesack E., Ripoche D., Sida T., Singh U., Smith W., Srivastava A., Sinha S., Tao F., Thorburn P., Timlin D., Traoré B., Twine T., and Webber H., 2020, Modelling climate change impacts on maize yields under low nitrogen input conditions in sub‐Saharan Africa, Global Change Biology, 26: 5942-5964.
https://doi.org/10.1111/gcb.15261
Feng P., Wang B., Harrison M., Wang J., Liu K., Huang M., Liu D., Yu Q., and Hu K., 2022, Soil properties resulting in superior maize yields upon climate warming, Agronomy for Sustainable Development, 42(5): 81.
https://doi.org/10.1007/s13593-022-00818-z
Filippi P., Jones E., Wimalathunge N., Somarathna P., Pozza L., Ugbaje S., Jephcott T., Paterson S., Whelan B., and Bishop T., 2019, An approach to forecast grain crop yield using multi-layered, multi-farm data sets and machine learning, Precision Agriculture, 20(5): 1015-1029.
https://doi.org/10.1007/s11119-018-09628-4
Gupta S., Geetha A., Sankaran K., Zamani A., Ritonga M., Raj R., Ray S., and Mohammed H., 2022, Machine learning- and feature selection-enabled framework for accurate crop yield prediction, Journal of Food Quality, 2022: 6293985.
https://doi.org/10.1155/2022/6293985
Habibi L., Matsui T., and Tanaka T., 2024, Critical evaluation of the effects of a cross-validation strategy and machine learning optimization on the prediction accuracy and transferability of a soybean yield prediction model using UAV-based remote sensing, Journal of Agriculture and Food Research, 18: 101096.
https://doi.org/10.1016/j.jafr.2024.101096
Hara P., Piekutowska M., and Niedbała G., 2021, Selection of independent variables for crop yield prediction using artificial neural network models with remote sensing data, Land, 10(6): 609.
https://doi.org/10.3390/land10060609
Jeong J., Resop J., Mueller N., Fleisher D., Yun K., Butler E., Timlin D., Shim K., Gerber J., Reddy V., and Kim S., 2016, Random forests for global and regional crop yield predictions, PLoS ONE, 11(6): e0156571.
https://doi.org/10.1371/journal.pone.0156571
Jiang M., Dong C., Bian W., Zhang W., and Wang Y., 2024, Effects of different fertilization practices on maize yield, soil nutrients, soil moisture, and water use efficiency in northern China based on a meta-analysis, Scientific Reports, 14: 57031.
https://doi.org/10.1038/s41598-024-57031-z
Kaleri A., Khanzada B., Rajput W., Bijarani A., Shafqat A., Arain A., Mirbahar S., Jokhio N., Majeedano A., and Majeedano S., 2026, Combined effects of nitrogen, phosphorus, and potassium on maize growth, development, and yield, Jammu Kashmir Journal of Agriculture, 5(3): 297-305.
https://doi.org/10.56810/jkjagri.005.03.0297
Kang Y., Ozdogan M., Zhu X., Ye Z., Hain C., and Anderson M., 2020, Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest, Environmental Research Letters, 15(6): 064005.
https://doi.org/10.1088/1748-9326/ab7df9
Khaki S., and Wang L., 2019, Crop yield prediction using deep neural networks, Frontiers in Plant Science, 10: 621.
https://doi.org/10.3389/fpls.2019.00621
Kim K., and Lee B., 2023, Effects of climate change and drought tolerance on maize growth, Plants, 12(20): 3548.
https://doi.org/10.3390/plants12203548
Kuradusenge M., Hitimana E., Hanyurwimfura D., Rukundo P., Mtonga K., Mukasine A., Uwitonze C., Ngabonziza J., and Uwamahoro A., 2023, Crop yield prediction using machine learning models: Case of Irish potato and maize, Agriculture, 13(1): 225.
https://doi.org/10.3390/agriculture13010225
Leng G., and Hall J., 2020, Predicting spatial and temporal variability in crop yields: An inter-comparison of machine learning, regression and process-based models, Environmental Research Letters, 15(4): 044027.
https://doi.org/10.1088/1748-9326/ab7b24
Li C., Camac J., Robinson A., and Kompas T., 2025, Predicting changes in agricultural yields under climate change scenarios and their implications for global food security, Scientific Reports, 15: 87047.
https://doi.org/10.1038/s41598-025-87047-y
Li E., Zhao J., Pullens J., and Yang X., 2021, The compound effects of drought and high temperature stresses will be the main constraints on maize yield in Northeast China, Science of the Total Environment, 812: 152461.
https://doi.org/10.1016/j.scitotenv.2021.152461
Li L., Zhang Y., Wang B., Feng P., He Q., Shi Y., Liu K., Harrison M., Liu D., Yao N., Li Y., He J., Feng H., Siddique K., and Yu Q., 2023, Integrating machine learning and environmental variables to constrain uncertainty in crop yield change projections under climate change, European Journal of Agronomy, 151: 126917.
https://doi.org/10.1016/j.eja.2023.126917
Li Y., Guan K., Yu A., Peng B., Zhao L., Li B., and Peng J., 2019, Toward building a transparent statistical model for improving crop yield prediction: Modeling rainfed corn in the U.S., Field Crops Research, 234: 55-65.
https://doi.org/10.1016/j.fcr.2019.02.005
Li Z., Ding L., and Xu D., 2022, Exploring the potential role of environmental and multi-source satellite data in crop yield prediction across Northeast China, Science of the Total Environment, 806: 152880.
https://doi.org/10.1016/j.scitotenv.2021.152880
Luthra N., Srivastava A., Shahi U., Singh V., Dey P., and Singh A., 2024, Prediction of post-harvest soil nutrient status through multiple linear regression for targeted yield of hybrid maize, Indian Journal of Agronomy, 68(4): 547-553.
https://doi.org/10.59797/ija.v68i4.5471
Maseko S., Van Der Laan M., Tesfamariam E., Delport M., and Otterman H., 2024, Evaluating machine learning models and identifying key factors influencing spatial maize yield predictions in data intensive farm management, European Journal of Agronomy, 160: 127193.
https://doi.org/10.1016/j.eja.2024.127193
Matiu M., Ankerst D., and Menzel A., 2017, Interactions between temperature and drought in global and regional crop yield variability during 1961-2014, PLoS ONE, 12(5): e0178339.
https://doi.org/10.1371/journal.pone.0178339
Medina H., and Tian D., 2023, Synergistic contributions of climate and management intensifications to maize yield trends from 1961 to 2017, Environmental Research Letters, 18(3): 034021.
https://doi.org/10.1088/1748-9326/acb27f
Meng L., Liu H., Ustin S., and Zhang X., 2021, Predicting maize yield at the plot scale of different fertilizer systems by multi-source data and machine learning methods, Remote Sensing, 13(18): 3760.
https://doi.org/10.3390/rs13183760
Morales A., and Villalobos F., 2023, Using machine learning for crop yield prediction in the past or the future, Frontiers in Plant Science, 14: 1128388.
https://doi.org/10.3389/fpls.2023.1128388
Nyéki A., Kerepesi C., Daróczy B., Benczúr A., Milics G., Nagy J., Harsányi E., Kovács A., and Neményi M., 2021, Application of spatio-temporal data in site-specific maize yield prediction with machine learning methods, Precision Agriculture, 22: 1397-1415.
https://doi.org/10.1007/s11119-021-09833-8
Ocwa A., Harsányi E., Széles A., Holb I., Szabó S., Rátonyi T., and Mohammed S., 2023, A bibliographic review of climate change and fertilization as the main drivers of maize yield: Implications for food security, Agriculture and Food Security, 12(1): 19.
https://doi.org/10.1186/s40066-023-00419-3
Oikonomidis A., Catal C., and Kassahun A., 2022, Hybrid deep learning-based models for crop yield prediction, Applied Artificial Intelligence, 36(1): 2031823.
https://doi.org/10.1080/08839514.2022.2031823
Pham H., Awange J., and Kuhn M., 2022, Evaluation of three feature dimension reduction techniques for machine learning-based crop yield prediction models, Sensors, 22(17): 6609.
https://doi.org/10.3390/s22176609
Priyatikanto R., Lu Y., Dash J., and Sheffield J., 2023, Improving generalisability and transferability of machine-learning-based maize yield prediction model through domain adaptation, SSRN Electronic Journal, 1: 1-29.
https://doi.org/10.2139/ssrn.4122021
Probst P., Wright M., and Boulesteix A., 2018, Hyperparameters and tuning strategies for random forest, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3): e1301.
https://doi.org/10.1002/widm.1301
Qian Y., Zhang Z., Jiang F., Wang J., Dong F., Liu J., and Peng X., 2025, Impacts of tillage treatments on soil physical properties and maize growth at two sites under different climatic conditions in black soil region of Northeast China, Soil and Tillage Research, 257: 106471.
https://doi.org/10.1016/j.still.2025.106471
Qin M., Zheng E., Hou D., Meng X., Meng F., Gao Y., Chen P., Qi Z., and Xu T., 2023, Response of wheat, maize, and rice to changes in temperature, precipitation, CO2 concentration, and uncertainty based on crop simulation approaches, Plants, 12(14): 2709.
https://doi.org/10.3390/plants12142709
Radočaj D., Plaščak I., and Jurišić M., 2025, A comparative assessment of regular and spatial cross-validation in subfield machine learning prediction of maize yield from Sentinel-2 phenology, Eng, 6(10): 270.
https://doi.org/10.3390/eng6100270
Satpathi A., Setiya P., Das B., Nain A., Jha P., Singh S., and Singh S., 2023, Comparative analysis of statistical and machine learning techniques for rice yield forecasting for Chhattisgarh, India, Sustainability, 15(3): 2786.
https://doi.org/10.3390/su15032786
Shahhosseini M., Hu G., Khaki S., and Archontoulis S., 2021, Corn yield prediction with ensemble CNN-DNN, Frontiers in Plant Science, 12: 709008.
https://doi.org/10.3389/fpls.2021.709008
Shastry A., Sanjay H., and Bhanusree E., 2017, Prediction of crop yield using regression techniques, International Journal of Computing, 6(5): 1-5.
Sierra-Forero B., Barón-Velandia J., and Vanegas-Ayala S., 2024, Assessment of the relevance of features associated with corn crop yield prediction in Colombia, a country in the Neotropical zone, International Journal of Information Technology, 16: 2129-2138.
https://doi.org/10.1007/s41870-024-01762-9
Sun Z., Yang R., Wang J., Zhou P., Gong Y., Gao F., and Wang C., 2024, Effects of nutrient deficiency on crop yield and soil nutrients under winter wheat-summer maize rotation system in the North China Plain, Agronomy, 14(11): 2690.
https://doi.org/10.3390/agronomy14112690
Sweet L., Müller C., Anand M., and Zscheischler J., 2023, Cross-validation strategy impacts the performance and interpretation of machine learning models, Artificial Intelligence for the Earth Systems, 2(4): e230026.
https://doi.org/10.1175/aies-d-23-0026.1
Vashisth A., and Aravind K., 2026, Maize yield estimation at different growth stage using weather variables by LASSO, elastic net and stepwise multiple linear regression techniques, Scientific Reports, 16: 34239.
https://doi.org/10.1038/s41598-025-34239-1
Vogel E., Donat M., Alexander L., Meinshausen M., Ray D., Karoly D., Meinshausen N., and Frieler K., 2019, The effects of climate extremes on global agricultural yields, Environmental Research Letters, 14(5): 054010.
https://doi.org/10.1088/1748-9326/ab154b
Wang N., Ai Z., Zhang Q., Leng P., Qiao Y., Li Z., Tian C., Cheng H., Chen G., and Li F., 2024, Impacts of nitrogen (N), phosphorus (P), and potassium (K) fertilizers on maize yields, nutrient use efficiency, and soil nutrient balance: Insights from a long-term diverse NPK omission experiment in the North China Plain, Field Crops Research, 317: 109616.
https://doi.org/10.1016/j.fcr.2024.109616
Wang X., Li X., Lou Y., You S., and Zhao H., 2024, Refined evaluation of climate suitability of maize at various growth stages in major maize-producing areas in the North of China, Agronomy, 14(2): 344.
https://doi.org/10.3390/agronomy14020344
Wang Y., Shen Y., Yu S., Zhang X., and Xiao D., 2025, Climate extremes are critical to maize yield and will be severer in North China, Climate Risk Management, 47: 100710.
https://doi.org/10.1016/j.crm.2025.100710
Wu J., Chen X., Zhang H., Xiong L., Lei H., and Deng S., 2019, Hyperparameter optimization for machine learning models based on Bayesian optimization, Journal of Electronic Science and Technology, 17(1): 26-40.
https://doi.org/10.11989/jest.1674-862x.80904120
Yang J., Yang J., Liu S., and Hoogenboom G., 2014, An evaluation of the statistical methods for testing the performance of crop models with observed data, Agricultural Systems, 127: 81-89.
https://doi.org/10.1016/j.agsy.2014.01.008
Zhao F., Wang G., Li S., Hagan D., and Ullah W., 2023, The combined effects of VPD and soil moisture on historical maize yield and prediction in China, Frontiers in Environmental Science, 11: 1117184.
https://doi.org/10.3389/fenvs.2023.1117184
Zhu W., Rezaei E., Sun Z., Wang J., and Siebert S., 2024, Soil-climate interactions enhance understanding of long-term crop yield stability, European Journal of Agronomy, 160: 127386.
https://doi.org/10.1016/j.eja.2024.127386

Other articles by authors
. Jinhua Cheng
. Wei Wang